Introducing sdmTMB

Aqua skill-sharing sessions

Max Lindmark, Agnes Olin, Astrid Carlsen

2025-02-03

Why is this so great?

  • Spatial and spatiotemporal random effects for modelling spatially-structured latent variables
  • All standard families including Tweedie, student, delta/hurdle models
  • Option to sample with Stan for Bayesian inference
  • Easily determine fit and convergence with diagnostic and residuals
  • Built-in index standardization
  • Extremely active and helpful developers

Why is this so great?

  • Tons of model specifications available

Today’s session

  • Super brief intro to the approach
  • 3 case studies

Why spatial models?

  • Spatial autocorreltion: correlation depends on distance.
    • No concern!
  • Residual spatial autocorreltion: Big concern!
  • Violates statistical assumptions
  • Inference doesn’t have to be affected by RSA, but they can be so we need to adress this

Why spatial models?

  • Spatial autocorrelation emerges in ecological data because of drivers acting on large spatial scales
  • Include the covariates you believe generate the patters, including proxys
  • Look at a plot of residuals in space, still a problem?
    • (very likely yes)

Gaussian Random Fields (GRFs)

  • You can include a smooth surface to model the residuals patterns, e.g., with s()
  • GRFs are nice because they are efficient and provide interesting information
  • A way of estimating a wiggly surface that represents unmeasured variables

Gaussian Markov Random Fields (GRMFs) & the SPDE approach

  • GRFs are computationally demanding
  • Instead represent the spatial field as a solution to a partial differential equation (SPDE)
  • Solving this SPDE requires triangulation of the domain
  • This results in a Gaussian Markov Random field (GMRF)

Matérn

Meshes

  • The SPDE method involves a triangulation of the domain (mesh)
  • You need to construct this mesh, e.g., using fmesher::fm mesh 2d()

Meshes

Construct a mesh

mesh <- make_mesh(catch, xy_cols = c("X", "Y"), cutoff = 3)

Fit a non-spatial model

fit1 <- sdmTMB(no_km2 ~ year_f + survey,
               data = catch,
               family = delta_gamma(type = "poisson-link"),
               spatiotemporal = "off",
               spatial = "off",
               time = "year")

s <- sanity(fit1)
s
$hessian_ok
[1] TRUE

$eigen_values_ok
[1] TRUE

$nlminb_ok
[1] TRUE

$range_ok
[1] TRUE

$gradients_ok
[1] TRUE

$se_magnitude_ok
[1] TRUE

$se_na_ok
[1] TRUE

$sigmas_ok
[1] TRUE

$all_ok
[1] TRUE

Check residuals

res <- simulate(fit1, nsim = 1000, type = "mle-mvn") |>
  dharma_residuals(fit1, plot = FALSE)

ggplot(res, aes(observed, expected)) +
  geom_point(color = "grey30", shape = 21, size = 0.5) +
  geom_abline(col = "tomato3", linewidth = 0.6) +
  theme(aspect.ratio = 1) +
  labs(x = "Observed", y = "Expected")

Fit a spatiotemporal and compare residuals

Extract coefficicents

tidy(fit2)
# A tibble: 14 × 5
   term        estimate std.error conf.low conf.high
   <chr>          <dbl>     <dbl>    <dbl>     <dbl>
 1 (Intercept)    0.561     0.834   -1.07      2.19 
 2 year_f2013    -1.53      0.838   -3.18      0.111
 3 year_f2014    -2.33      1.00    -4.29     -0.374
 4 year_f2015    -0.672     0.735   -2.11      0.769
 5 year_f2016    -0.696     0.735   -2.14      0.745
 6 year_f2017    -0.805     0.740   -2.26      0.646
 7 year_f2018    -0.212     0.752   -1.69      1.26 
 8 year_f2019    -0.355     0.740   -1.81      1.10 
 9 year_f2020    -1.06      0.751   -2.53      0.417
10 year_f2021    -0.665     0.757   -2.15      0.820
11 year_f2022    -0.775     0.769   -2.28      0.732
12 year_f2023    -0.443     0.763   -1.94      1.05 
13 year_f2024    -1.00      0.767   -2.51      0.500
14 surveyIBTS     0.832     0.272    0.299     1.37 

Extract coefficicents

tidy(fit2, effects = "ran_pars", model = 1)
# A tibble: 3 × 5
  term    estimate std.error conf.low conf.high
  <chr>      <dbl>     <dbl>    <dbl>     <dbl>
1 range     24.6      14.7      7.62     79.3  
2 sigma_E    1.74      0.885    0.643     4.72 
3 rho        0.986    NA        0.938     0.997
tidy(fit2, effects = "ran_pars", model = 2)
# A tibble: 4 × 5
  term    estimate std.error conf.low conf.high
  <chr>      <dbl>     <dbl>    <dbl>     <dbl>
1 range      4.65      4.97     0.571    37.8  
2 phi        4.12      0.615    3.08      5.52 
3 sigma_E    0.791     0.591    0.183     3.42 
4 rho        0.754    NA       -0.173     0.973

Plot spatiotemporal random effects

  • First we need to create a spatial grid to predict on…

Plot spatiotemporal random effects

Check anisotropy

Case study 2: Cod diet data

  • Response variable is relative prey weight (g/g) that contains lots of zeroes
  • Long data (1 row = 1 observation of a prey group)
  • “Multispecies” model

Case study 2: Cod diet data

Construct a mesh

The model

m2 <- sdmTMB(
  rpw_2 ~ 0 + prey_group * pred_length_sc + s(year_ct, by = prey_group) +
    oxy_sc + temp_sc + depth_sc + salinity_sc,
  data = d,
  mesh = mesh,
  time = "year",
  extra_time = missing_years,
  spatial = "off",
  spatiotemporal = "iid",
  spatial_varying = ~ 0 + prey_group,
  family = tweedie()
  )

Plot residuals and compare with non spatial model

Spatial predictions

Calculate a relative “index of abundance”

# Calculate indices

inds <- list()

for (i in unique(d$prey_group)) {

  pred_grid$prey_group <- as.factor(i)
  p <- predict(m2, newdata = pred_grid, return_tmb_object = TRUE)
  t <- get_index(p, area = 1 / nrow(pred_grid |> filter(year == 2000)))
  inds[[i]] <- t |>
    mutate(
      prey_group = i
    )
  }

ind <- bind_rows(inds)

Plot indices

Further watching